ABSTRACT

The discovery of extrasolar planets is one of the major scientific advances of the 
last two decades. Hundreds of planets have now been detected and astronomers are 
beginning to characterise their composition and physical characteristics. To do this 
requires a huge quantity of spectroscopic data most of which is not available from 
laboratory studies. The ExoMol project will offer a comprehensive solution to this 
problem by providing spectroscopic data on all the molecular transitions of importance
in the atmospheres of exoplanets. These data will be widely applicable to other
problems and will be used for studies on cool stars, brown dwarfs and circumstellar 
environments. This paper lays out the scientific foundations of this project and reviews 
previous work in this area. 

A mixture of first principles and empirically-tuned quantum mechanical methods
will be used to compute comprehensive and very large rotation-vibration and
rotation-vibration-electronic (rovibronic) line lists. Methodologies will be developed 
for treating larger molecules such as methane and nitric acid. ExoMol will rely on 
these developments and the use of state-of-the-art computing. 

Key words: molecular data; opacity; astronomical data bases: miscellaneous; planets 
and satellites: atmospheres; stars: low-mass 



1 INTRODUCTION 



Most information on the Universe around us has been gained by astronomers studying the spectral signatures of astronomical 
bodies. Interpreting these spectra requires access to appropriate laboratory spectroscopic data as does the construction of 
associated radiative transport and atmospheric models. For hot bodies the quantities of atomic and molecular data involved 
can be very substantial: beyond that which is easily harvested using only laboratory experiments. This problem led, for 
example, to the establishment of the Opacity Project (The Opacity Project Team & Seaton 1995; The Opacity Project Team
& Berrington 1994; Seaton 2005) some 35 years ago with the explicit aim of calculating all the necessary radiative data
involving atomic ions which could be of importance for models of (hot) stars. This project was introduced by papers laying
out the scientific (Seaton 1987) and computational (Berrington et al. 1987) background for the project.

Stars cooler than our own Sun have significant quantities of molecules in their outer atmospheres. These molecules have 
spectra which are, in general, much richer than those of atoms and atomic ions. They thus both dominate the spectral signature 
of the cool stars and provide their major opacity sources which, in turn, determine their atmospheric structures. There are 
even cooler objects which are neither stars nor planets, called Brown Dwarfs. These objects are largely characterised and 
classified according to the molecular features in their atmospheres. The last decade has also witnessed a rapid escalation in
the number of planets orbiting other stars (exoplanets) that have been identified. This number is still increasing rapidly. So far
spectroscopic studies of exoplanets are limited in number, in wavelength coverage and, in particular, in resolution.
However it is already apparent from those studies available, which are so far largely confined to hot gas giant planets, that
analysing their results will place similar demands on molecular line lists to the requirements of cool stars and brown dwarfs. 
Modelling and interpreting the spectra of these objects requires data appropriate for temperatures up to about 3000 K. 

Since the first detection of sodium in an exoplanet (Charbonneau et al. 2002), exoplanet spectroscopy has made rapid
advances. However, even with a rather limited set of molecules detected in exoplanet atmospheres, there remain serious
problems with laboratory data. For example, methane was detected in HD189733b by Swain et al. (2008), who lacked the
necessary data to determine its quantity; even the presence of methane in other objects remains controversial (Stevenson et al.
2010; Beaulieu et al. 2011).

Figure 1. Room temperature (T = 296 K) comparison of the laboratory measured spectrum of ammonia, as taken from the HITRAN
database, with the line list calculated using the program "TROVE" by Yurchenko et al. (2009).



Planets and cool stars share some common fundamental characteristics: they are faint, their radiation peaks in the infrared 
and their atmosphere is dominated by strong molecular absorbers. Modelling planetary and stellar atmospheres is difficult as 
their spectra are extremely rich in structure and their opacity is dominated by molecular absorbers, each with hundreds of 
thousands to many billions of spectral lines which may be broadened by high-pressure and temperature effects. Despite many 
attempts and some successes in the synthesis of transition lists for molecular absorbers, reliable opacities for many important 
species are still lacking. 

Determining line lists for hot molecules experimentally is difficult because of (a) the sheer volume of data (maybe billions 
of lines), (b) the difficulty in obtaining absolute line strengths in many cases, (c) the need to have assigned spectra in order 
for the correct temperature dependence to be reproduced, (d) the need for completeness, which requires a large range of 
wavelengths; even at room temperature experimental line lists are often far from complete, see Figure 1 for an example. All
this means that a purely empirical strategy is problematic. Instead the plan is to build a reliable theoretical model for each 
molecule of importance, based on a combination of the best possible ab initio quantum mechanical treatment which is then 
validated by and, in most cases, tuned using experimental data. 

Molecular spectra, particularly for polyatomic species, rapidly become extraordinarily rich at elevated temperatures, 
meaning that the data requirement for a single triatomic molecule can outstrip the entire Opacity Project dataset. Figure 2 
illustrates the strong temperature-dependence of the spectrum of water which requires many millions of lines to simulate
at higher temperatures. Considerable effort has been expended in constructing spectroscopic databases, such as HITRAN
(Rothman et al. 2009) and GEISA (Jacquinet-Husson et al. 2011), which provide lists of molecular transitions important at
about 296 K. These are appropriate for modelling the atmosphere of our planet and those of the other members of our solar 
system. However the construction of accurate and complete databases for higher temperatures has been much more partial 
with most high accuracy studies concentrating on a single species. The present status of this data is reviewed below. 
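
As a rough illustration of why hot spectra need so many more lines than room-temperature ones, the sketch below compares the Boltzmann weight exp(-c2 E''/T) of a few lower-state energies E'' at 296 K and at 2500 K. The energies are hypothetical values chosen only for illustration, the function name is ours, and the partition function is deliberately left out, so only the change between the two temperatures is meaningful.

```python
import math

C2 = 1.438777  # second radiation constant hc/k, in cm K


def boltzmann_factor(e_lower_cm, temperature_k):
    """Thermal weight exp(-c2 * E'' / T) for a lower level at E'' (cm^-1)."""
    return math.exp(-C2 * e_lower_cm / temperature_k)


# Hypothetical lower-state energies in cm^-1 (illustration only).
levels = [0.0, 1000.0, 5000.0, 10000.0]

for e_lower in levels:
    cold = boltzmann_factor(e_lower, 296.0)
    hot = boltzmann_factor(e_lower, 2500.0)
    print(f"E'' = {e_lower:8.1f} cm^-1   296 K: {cold:.3e}   2500 K: {hot:.3e}")
```

Under these assumptions, transitions starting from levels several thousand cm^-1 above the ground state are negligible at room temperature but carry appreciable weight at 2500 K, which is the regime a room-temperature database does not need to cover.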

This paper lays out the scientific foundations of a new project, called ExoMol, which aims to systematically provide line 
lists for molecules of key astronomical importance. These molecules have been selected to be those most likely to be present 
in the atmospheres of extra-solar planets. In practice they are of importance in many other hot astronomical environments, 
particularly brown dwarfs and cool stars. The ExoMol project aims to provide a comprehensive database for these objects 
too. The following section summarises the presently available line lists and illustrates the importance of these line lists by 
considering some of the problems they have been applied to. Section 3 considers the requirements for providing comprehensive
data. The molecules concerned are categorised on physical grounds and appropriate methodologies are suggested for each 
class of problem. Section 4 gives conclusions and perspectives. 
Figure 2. Absorption spectra of H2O given by BT2 (Barber et al. 2006) for T = 300, 1000, and 2500 K.



2 THE CURRENT SITUATION 

Astronomers interested in molecular line lists use a number of collected data sources. Besides HITRAN and GEISA mentioned
above, the JPL (Pickett et al. 1998) and CDMS (Müller et al. 2005) databases provide comprehensive molecular line lists
for wavelengths longer than 30 μm. However these databases are aimed at the cool interstellar medium rather than hot
sources. HITEMP in both its original (Rothman et al. 1995) and recently updated (Rothman et al. 2010) editions carries
data appropriate for modelling molecular spectra at elevated temperatures, but only for five species. Kurucz has extended his
well-used atomic opacity tables with data for a number of molecules (Kurucz 2011) but the data for the majority of molecules
are approximate and the list of molecules far from complete. Similarly there are partial lists of diatomic opacities provided
by the UGAMOP database at the University of Georgia (see www.physast.uga.edu/ugamop/) and the RADEN databank at
Moscow State University (Hefferlin & Kuznetsova 1999). The SCAN database also contains line lists for a few diatomics and
triatomics (Jørgensen 1996). In summary, while there are a number of sources of molecular line list data, none of them can
be considered complete, especially for work at elevated temperatures.



In their review of brown dwarf and very low mass star atmospheres, Allard et al. (1997) found that the majority of
molecular opacities available were based on statistical or similarly approximate treatments. Indeed they quote no case where
they considered the available molecular line lists to be adequate. This situation has improved somewhat since 1997; Tables 1
and 2 summarise what we believe to be the current situation for line lists of hot diatomic and polyatomic species respectively.



2.1 Diatomics 



Table 1 lists diatomic line lists which are published, available and fairly complete. Thus, for example, we have omitted the HF
line list used by Uttenthaler et al. (2008) as there is no source for this data or, indeed, any details on how it was calculated.
Similarly the recent AlO line list of Launila & Berg (2011) contains accurate, measured line frequencies but no transition
intensities. In addition, the MARCS model atmosphere code (Gustafsson et al. 2008) contains unpublished molecular line
opacities for a number of diatomic species.

It is interesting to consider some of the diatomics that have been treated. Only in a minority of cases, specifically CO, OH
and NO which all form part of the HITEMP database (Rothman et al. 2010), have the line lists been constructed essentially on
the basis of experimental data, see for example Goorvitch (1994) and Bernath & Colin (2009). A more typical and demanding
situation is given by TiO.



TiO is a major opacity source in cool, oxygen-rich stars (Allard et al. 1997). It is an open shell system with several
low-lying electronic states which can absorb at near-infrared and red wavelengths, that is close to the radiation peak in a cool
star. A number of theoretical studies provided at least partial line lists for this system (Jørgensen 1994; Plez 1998; Alvarez &
Plez 1998). At the same time there have been several detailed experimental spectroscopic studies on the system (Gustavsson
et al. 1991; Simard & Hackett 1991; Amiot et al. 1995; Kaledin et al. 1995; Ram et al. 1996). Schwenke (1998) combined
these studies and data from earlier laboratory spectra (Linton 1974; Hocking et al. 1979; Galehouse et al. 1980; Brandes &
Galehouse 1985) with state of the art ab initio calculations to give a comprehensive TiO line list containing 37 million lines.
Table 1. Recommended available line lists for hot diatomic molecules. Given are the main reference, the method: experimental
(expt), ab initio (ai) or semi-empirical (semi), isotopologues other than the main one, completeness up to a given estimated
temperature (T^max), number of lines in the line list (N) and the electronic source of the data, if available.

Molecule  Ref                             Method  Isotopologues     T^max (c)  N         Available (d)

C2        Kurucz (2011)                   semi                      5200 K     3459595   Kurucz
CH        Kurucz (2011)                   semi                      3000 K     71591     Kurucz
CN        Kurucz (2011)                   semi                      5000 K     1644597   Kurucz
CO        Rothman et al. (2010)           expt                      4000 K     113631    HITEMP
CaH       Weck et al.                     semi                      3500 K     89970     UGAMOP
CrH       Burrows et al. (2002)           semi    50Cr, 53Cr, 54Cr  1300 K     13824     Bernath
FeH       Bernath and co-workers (a)      semi    54Fe, 57Fe, 58Fe  1600 K     116300    Bernath
HD+       Coppola et al. (2011)           ai                        all        10120     ExoMol
HeH+      Engel et al. (2005)             ai      all (e)           all        8573      ExoMol
LiCl      Weck et al. (2004)              semi                      4000 K     3357811   UGAMOP
LiH       Coppola et al. (2011)           ai                        all        18981     ExoMol
LiH+      Coppola et al. (2011)           ai                        all        329       ExoMol
MgH       Weck and co-workers (b)         semi                      1300 K     23315     UGAMOP
OH        Rothman et al. (2010)           expt                      4000 K     41577     HITEMP
NH        Kurucz (2011)                   semi                      3000 K     36163     Kurucz
NO        Rothman et al. (2010)           expt                      4000 K     115610    HITEMP
SiH       Kurucz (2011)                   semi                      3000 K     78286     Kurucz
SiO       Langhoff & Bauschlicher (1993)  semi
SiO       Kurucz (2011)                   semi                      5300 K     1827047   Kurucz
TiH       Burrows et al. (2005)           semi                      1800 K     199073    Bernath
TiO       Schwenke (1998)                 semi                      6200 K     37744499  Kurucz

(a) Dulick et al. (2003); Wende et al. (2010).
(b) Weck et al. (2003b,a); Skory et al. (2003); Weck et al. (2003c).
(c) T^max should be considered as a guide indicating the completeness of the line list in question, estimated from the maximal
energy of the line list and the following condition on the Boltzmann factor: exp(-E^max / k T^max) = 5 × 10^-7. The latter is an
empirical threshold that corresponds to T^max and E^max of the BT2 line list (Barber et al. 2006) (see Table 2). 'all' indicates
that all bound-bound transitions within the ground electronic state are given.
(d) Data sources:
Bernath: http://bernath.uwaterloo.ca/XY where XY is the chemical formula of the molecule
CDSD databank: ftp://ftp.iao.ru/pub/CDSD-4000
ExoMol project: www.exomol.com
HITEMP: http://www.cfa.harvard.edu/hitran/HITEMP.html
Kurucz CDs: http://kurucz.harvard.edu/
UGAMOP project: http://www.physast.uga.edu/ugamop/
(e) 'all' means all possible stable isotopologues.
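
The completeness criterion in footnote c to Table 1 can be inverted to estimate T^max from the highest energy in a line list. The sketch below is only an illustration: the threshold of 5 × 10^-7 is our reading of the footnote, the value E^max = 30000 cm^-1 used for BT2 is an assumption, and the function name is hypothetical.

```python
import math

C2 = 1.438777        # second radiation constant hc/k, in cm K
THRESHOLD = 5.0e-7   # assumed empirical Boltzmann-factor threshold


def t_max(e_max_cm, threshold=THRESHOLD):
    """Temperature at which exp(-c2 * E_max / T) equals the threshold."""
    return C2 * e_max_cm / (-math.log(threshold))


# Assumed upper energy limit for a BT2-like line list (cm^-1).
print(f"T_max for E_max = 30000 cm^-1: {t_max(30000.0):.0f} K")
```

With these assumed inputs the estimate comes out close to 3000 K, in line with the T^max quoted for water in Table 2.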



This line list and the corresponding TiO opacity was found to give a very good representation of the TiO absorption in cool 
stars (Allard et al. 2000) and is now widely used. Schwenke's TiO line list is the largest available for a diatomic by more than
an order-of-magnitude. 

The next largest available diatomic line list is that for C2 which is one of a number of diatomic species for which line
lists have been provided by Kurucz. Recently, however, there have been several new experimental measurements on this system,
including the characterisation of entirely new, low-lying electronic bands (Kokkin et al. 2006; Joester et al. 2007; Tanabashi
et al. 2007; Nakajima et al. 2009; Bornhauser et al. 2010, 2011). This work has been accompanied by significantly improved
ab initio electronic structure calculations (Kokkin et al. 2007; Schmidt & Bacskay 2007; Nakajima et al. 2009; Schmidt &
Bacskay 2011). Given its importance, C2 is one of the species we aim to provide an updated line list for.



The important CN radical is represented only by the Kurucz (2011) data in Table 1, which is somewhat approximate.
We should note the recent experimental efforts by Ram et al. (2010) and by Ram & Bernath (2011) offering
partial information. More work is therefore needed on this species.

Before turning to larger molecules it is worth considering the FeH and MgH molecules. Both these molecules have been 
the subject of experimental studies which have provided partial line lists. 

In the case of FeH, Dulick et al. (2003) present results on rovibronic transitions within the F ⁴Δ - X ⁴Δ electronic band.
The available line list is based on measured transitions which give the spectroscopic constants of rotational levels belonging to
the v = 0, 1, 2 vibrational levels of the FeH X and F states; these are then extrapolated to v = 3 and 4, and for J (= N + S)
values up to 50.5, where v is the vibrational, J the total angular momentum, N the rotational and S the spin quantum
number. The line list for this band therefore consists of experimental and extrapolated term values for the 25 vibrational
bands with v ≤ 4. The line list of Dulick et al. (2003) was verified and corrected, by scaling the Einstein A-coefficients, by
Wende et al. (2010) using the high-resolution spectra of red dwarf star GJ 1002. Hargreaves et al. (2010) provide a line list
for the FeH E ⁴Π - A ⁴Π electronic system near 1.58 μm which combined measured frequencies with ab initio calculation of
the linestrengths.

© 2012 RAS, MNRAS 000, 1-18

Table 2. Recommended available line lists for hot polyatomic molecules. All line lists are theoretical and designed to be complete up
to an estimated maximum temperature, T^max. N is the number of lines in millions for the main isotopologue only. For data sources
see footnote d to Table 1.

Molecule  Ref                         Isotopologues                  T^max   10^-6 N  Available

H3+       Neale et al. (1996)         H2D+ (Sochi & Tennyson 2010)   3000 K  12       ExoMol
H2O       Barber et al. (2006)        HDO (Voronin et al. 2010)      3000 K  503      ExoMol
HCN/HNC   Harris et al. (2006)        H13CN (Harris et al. 2008)     3000 K  240      ExoMol
C3        Jørgensen et al. (1989)                                    3100 K
CO2       Tashkun & Perevalov (2011)  all                            5000 K  626      CDSD
NH3       Yurchenko et al. (2011a)                                   1500 K  1014     ExoMol



MgH, along with CrH, is of potential interest for measuring the presence of deuterium in brown dwarfs (Pavlenko et al.
2008), the so-called deuterium test (Béjar et al. 1999). Extensive experimental studies of MgH electronic spectra have been
performed by Weck et al. (2003b), Weck et al. (2003a) and Skory et al. (2003). These have been used to compute the complete
line list for the B' ²Σ⁺ - X ²Σ⁺ system of ²⁴MgH. The list includes transition energies and oscillator strengths over the
11,850-32,130 cm⁻¹ wavenumber range, for all possible allowed transitions from the ground electronic state vibrational levels
v'' ≤ 11. This list was computed using the best available ab initio potential energies and dipole transition moment function,
with the former adjusted to account for experimental data. The status of CrH is somewhat similar to this.

It is clear from the above that a mixed experimental and theoretical approach has been the most successful so far. Other
experimental datasets are available, for example a very extensive study has recently been completed on NiH (Vallon et al.
2009; Ross et al. 2012), and these will provide an appropriate starting point for further line lists.



2.2 Polyatomic molecules 



Table 2 summarises the available line lists for polyatomic molecules. In cases where several line lists are available, only the
recommended one is given.

For polyatomic molecules the main methodology has been theoretical. So far calculations have, of necessity, been performed
piecemeal and molecule-by-molecule. For key molecules many line lists may be available; for example, at least seven lists are
available for hot water (Allard et al. 1994; Wattson & Rothman 1992; Viti et al. 1997; Partridge & Schwenke 1997; Jørgensen
et al. 2001; Schwenke & Partridge 2000; Barber et al. 2006). Studies have shown significant differences between the use of
different line lists (see Jones et al. (2002) for example). It is clear that use of complete and spectroscopically accurate line
lists is important both for modelling hot astronomical objects and for interpreting their spectra.



The recent CDSD-4000 CO2 line list of Tashkun & Perevalov (2011) extended their earlier work, which was used in
the 2010 edition of HITEMP (Rothman et al. 2010), to higher temperatures. This line list was constructed using effective
Hamiltonians parameterised using experimental data. Carbon dioxide is a more rigid molecule than the other polyatomics
considered in Table 2 and thus a good candidate for treatment using effective Hamiltonians. Very recent, high-temperature,
emission experiments by Depraz et al. (2012) suggest that the CDSD-4000 line list is indeed the best available for modelling
high temperature spectra but there remains some work to be done on this problem. We note that a new theoretical study
using methods closer to those advocated here has recently started (Huang et al. 2012).

It is instructive to consider the breadth of applications of the calculated line lists, many of which could not have been
anticipated prior to their construction. The polyatomic line lists summarised in Table 2 have all also been used to predict,
analyse and assign laboratory spectra of the species, especially at elevated temperatures. These line lists also provide a source
of cooling functions (Miller et al. 2010; Coppola et al. 2011) and high-temperature partition functions (Neale & Tennyson
1995; Vidler & Tennyson 2000; Barber et al. 2002) which are important for a variety of astrophysics problems. In addition
other applications can be summarised as follows.



H3+. Neale et al. (1996)'s line list and related partition function (Neale & Tennyson 1995) have been used:

• to give all transition intensities for interpreting astronomical observations since there are no laboratory absolute intensity
measurements for the spectrum of H3+;

• to significantly improve models of cool white dwarf stars (Bergeron et al. 1997);

• to resolve issues with the Jovian energy budget (Miller et al. 2000);

• to probe the role of H3+ in primordial cooling (Glover & Savin 2009);

• to provide stability limits for giant extrasolar planets orbiting near their star (Koskinen et al. 2007);

• to model non-thermal rotational distributions of H3+ both in the interstellar medium (Oka & Epp 2004) and in storage
ring experiments (Kreckel et al. 2002; Kreckel et al. 2004).



Water. The BT2 line list (Barber et al. 2006) has been used:

• to show an imbalance between nuclear spin and rotational temperatures in cometary comae (Dello Russo et al. 2004,
2005) and assign a new set of, as yet unexplained, high energy water emissions in comets (Barber et al. 2009);

• to detect and analyse water spectra in (a) Nova-like object V838 Mon (Banerjee et al. 2005), (b) atmospheres of brown
dwarfs (Lyubchik et al. 2007) and (c) FU Orionis objects;

• to calculate the refractive index of humid air in the infrared (Mathar 2007);

• to detect water on transiting extrasolar planets, for which it was completely instrumental (Tinetti et al. 2007) as other
available line lists did not contain good enough coverage of the many weak lines that become significant absorbers at high
temperatures to make this detection securely;

• for high speed thermometry (Kranendonk et al. 2007), tomographic (Ma et al. 2009) imaging in gas engines and burners
and input for models of jet engines (Lindermeir & Beier 2012);

• as input for an improved theory of line-broadening (Bykov et al. 2008);

• to model water spectra in the deep atmosphere of Venus (Bailey 2009);

• to validate the data used in models of the Earth's atmosphere and in particular simulating the contribution of weak water
transitions to the so-called water continuum (Chesnokova et al. 2009).



HCN/HNC. Jørgensen et al. (1985) showed that including HCN opacity in their model of atmospheres of cool carbon-rich
stars caused the modelled atmosphere to expand by a factor of 5 and lowered the gas pressure of the surface layers by 1 or
2 orders of magnitude. This finding did much to stimulate detailed work on molecular line opacities. Subsequent line lists
(Harris et al. 2002, 2006) have treated the isomerising HCN/HNC as a single species. They have been used:

• to detect HNC in the spectra of carbon stars (Harris et al. 2003);

• to constrain C and N abundances in AGB stars (Matsuura et al. 2005);

• for models of the thermochemistry of HCN (Barber et al. 2002);

• to assign a particularly extensive set of hot, laboratory HCN (Mellau 2011a) and HNC (Mellau 2011b) spectra.



HeH+. This molecule had been neglected from standard models of helium-rich white dwarfs (which rather surprisingly
included both H2+ and He2+, see for example Stancil (1994)). The line list of Engel et al. (2005) has been used:

• for models of the white dwarf stars, particularly helium-rich white dwarfs (Harris et al. 2004);

• to study the effects of early chemistry on the cosmic microwave background (Schleicher et al. 2008);

• as a starting point for calculations of end effects in the upcoming KATRIN neutrino mass measurement experiment (Doss
et al. 2006).



To add context to the methods discussed below, it should be noted that while the H3+ and HeH+ line lists are completely ab
initio, those for water, HCN/HNC, and NH3 used laboratory measurements to improve the procedure. For water and ammonia
this came via the use of a spectroscopically determined potential energy surface (PES), while the HCN/HNC calculations
replaced the calculated ab initio energy levels with observed ones where known. Both of these procedures, plus other methods
discussed below, take advantage of laboratory high resolution spectra. In contrast, comparisons between dipole transition
intensities computed using a completely ab initio dipole moment surface (DMS) and benchmark experimental studies have shown
that intensities calculated ab initio are competitive with, and often more accurate than, laboratory intensity measurements
even when they are available (Asvany et al. 2007; Lodi et al. 2011).



2.3 Scope of the ExoMol project

A list of species that we plan to consider is given in Table 3. This list is based on current demands for models of exoplanets and
brown dwarfs. However it is necessary to be flexible since as the characterisation of exoplanets improves additional molecular
line lists are likely to be required. 

Broadly speaking, it is possible to separate the spectroscopic demands for hot Jupiters, which have a reducing or hydrogen- 
rich chemistry, and super-Earth type exoplanets, which can be expected to be oxidising or oxygen-rich. In practice there will 
be other categories of exoplanets, such as warm Neptunes; however the above division should be sufficient to identify the 
species for which data are required. 
Table 3. ExoMol list of molecules: molecular line lists already available (fuller lists are given in Tables 1 and 2) and
planned, categorised by chemistry.

                    Primordial            Terrestrial Planets   Giant Planets & Cool Stars
                    (Metal-poor)          (Oxidising)           (Reducing atmospheres)

Already available   H2, HD+, LiH, LiH+,   OH, CO2, O3, NO,      H2, CN, CH, CO, CO2, TiO, YO,
                    HeH+, H3+, H2D+       H2O, NH3              HCN/HNC, H2O, NH3

ExoMol              BeH                   O2, CH4, SO2,         CH4, PH3, C2, C3, HCCH,
                                          HOOH, H2CO, HNO3      C2H6, C3H8, VO, O2, AlO, MgO,
                                                                BeH, CrH, MgH, FeH, CaH, AlH, SiH, NiH, TiH



Although the atmospheres of other exoplanets are now starting to be probed (Bean et al. 2010; Beaulieu et al. 2011),
those exoplanets that are currently the subject of spectroscopic analysis are largely hot Jupiters. For hot Jupiters there is
already a major demand for a high quality methane line list (Swain et al. 2008) and there are likely to be demands for line
lists of other hydrogenated species such as H2S, PH3, acetylene, ethane and propane. For super-Earths, that is planets with
rocky cores, oxygen bearing species such as ozone and oxides of nitrogen and sulphur are likely to be more important and
also will provide bio-signatures (Des Marais et al. 2002). Simulated remote spectra of habitable planets (Kaltenegger et al.
2010) suggest that a variety of molecules including even nitric acid could provide a possible bio-signature. Finally it is already
known from models of cool stars that open shell diatomics with low-lying electronic states have clear atmospheric signatures
and can play an important role in determining the radiative transport properties of the atmosphere. So far only for TiO
(Schwenke 1998) is there a satisfactory line list available.



Table 3 summarises the molecules that are thought likely to be important in the atmospheres of extra-solar planets and
cool stars. They are classified according to anticipated atmospheric chemistry, since this is radically different in systems
which have no heavy elements (such as bodies formed in the very early universe), are oxygen-rich or are carbon-rich. These
species are separated between those for which satisfactory line lists, extending to high temperature, are currently available,
and those for which line lists are needed. It is clear that, except for primordial chemistries, the to-do list is much the longer.
Table 3 has been constructed from the literature (e.g. Sharp & Burrows (2007); Freedman et al. (2008)) and as a result of
extended discussions with several scientists involved directly or indirectly in characterising exoplanets.

There are significant differences in the physical processes that need to be modelled for different molecules. These are 
addressed in the next section. Based on the underlying physics that needs to be considered, the problems can be classified as (1)
diatomics, (2) triatomics, (3) tetratomics, (4) methane and (5) larger molecules. Special techniques will be required in each 
case. A final topic will focus on the content, construction and use of the ExoMol database itself. 



3 METHODOLOGY 



Most of the molecules listed in Table 3 are chemically stable and only undergo electronic transitions in the ultraviolet. For these
systems it is necessary to consider in detail pure rotational and vibration-rotation transitions within the ground electronic 
state. However the list also contains a number of open shell diatomic species such as C2 and FeH. These molecules undergo 
electronic transitions at near infrared or visible wavelengths; for these systems it is necessary to consider electronic transitions 
and hence excited electronic states. 

Within the Born-Oppenheimer approximation, the calculation of line lists of rotation-vibration transitions for a stable 
polyatomic molecule can essentially be broken down into the following steps: ground-state electronic structure calculations to 
give energies and electric dipoles at a series of geometries; interpolation between geometries to create PES and DMS; nuclear 
motion calculations to provide energy levels and wave functions; calculation of transition dipoles using the wave functions and 
DMS. Predicted frequencies which arise from such a purely ab initio procedure are only accurate enough for present purposes 
for electronically very simple systems. It is therefore necessary to improve these frequencies using experimental data. This 
can be done in one of three ways: 

a. A priori by tuning the PES by comparison with the results of laboratory high resolution spectroscopic studies. We have 
developed new procedures for this |Yurchenko et al.,2008, ,2011b| ) which are both efficient and retain the predictive nature of 
the underlying ab initio PES. 

b. Post hoc by replacing calculated energy levels with observed ones. As energy levels are not observed directly we will rely 
on the MARVEL inversion procedure (Furtenbacher et al. 2007) which has been successfully used for water isotopologues 
(Tennyson et al. 2009, 2010). 



c. During the calculation: since most of the error is in the vibrational not the rotational energies (Polyansky et al. 1997), using 
empirical vibrational band origins can significantly improve predicted frequencies for all transitions in the band. The nuclear 
motion program TROVE (Yurchenko et al. 2007) contains the facility to replace predicted band origins with empirical ones 
part way through the calculation. This facility was used in constructing the recent BYTe line list for NH3 (Yurchenko et al. 
2011a). In practice all three methods will be used, often in combination. Figure 3 summarises our general methodology. 

Figure 3. Schematic of the general method that will be used to produce molecular line lists. 
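
The difference between methods (b) and (c) above can be illustrated with a minimal sketch. The state labels, the energies and both function names are hypothetical, and method (c) is reduced here to a rigid shift of each band; in a real calculation the empirical band origin is substituted part way through, so the correction also propagates through the rotation-vibration coupling.

```python
def replace_levels(calculated, observed):
    """Method (b): observed energies override calculated ones, state by state."""
    return {state: observed.get(state, energy) for state, energy in calculated.items()}


def shift_bands(calculated, empirical_origins):
    """Method (c), simplified: move every level of a vibrational state by the
    error in its calculated band origin, taken here to be the J = 0 level."""
    shifted = {}
    for (v, j), energy in calculated.items():
        if v in empirical_origins:
            energy += empirical_origins[v] - calculated[(v, 0)]
        shifted[(v, j)] = energy
    return shifted


# Hypothetical term values in cm^-1, keyed by (v, J); not real molecular data.
calculated = {(0, 0): 0.0, (0, 1): 20.0, (1, 0): 1000.5, (1, 1): 1020.25}

post_hoc = replace_levels(calculated, {(1, 0): 1000.0})
band_shifted = shift_bands(calculated, {1: 1000.0})

print(post_hoc[(1, 1)])       # unchanged: no observation exists for this state
print(band_shifted[(1, 1)])   # moved by the same amount as the v = 1 band origin
```

In this toy case method (b) improves only the state that was actually observed, whereas method (c) improves every rotational level built on the corrected band origin.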



For each line list the following components are required: 

(i) nuclear motion model, implemented in a computer program, for accurate calculations of rotation-vibration energies and 
wave functions, 

(ii) accurate PES and DMS, 

(iii) a computational procedure for intensity simulations based on the results of the nuclear motion calculations. 

It is generally accepted that ab initio DMS computed at high levels of theory provide a very reasonable description of 
the intensities, better for example, than can be obtained by attempting to fit a DMS to measured intensities (Lynas-Gray 
et al. 1995). Moreover, such properly obtained ab initio intensities are, in all but a few cases, superior to data provided by 
experiment (Lodi et al. 2011). 

Ab initio PES, however, cannot deliver rotation-vibration energies with sufficiently high accuracy. It is therefore common 
to empirically refine ab initio PESs by least-squares fitting to experimental energies or frequencies to give a "spectroscopic" 
PES; such potentials can provide theoretical line positions with near-experimental accuracy. When performing such fits it 
is important to prevent the refined surface from distorting into unrealistic shapes in regions not well characterised by the 
experimental data. To this end we impose an additional constraint requiring that the refined PES remains relatively close 
to the underlying ab initio PES (Yurchenko et al. 2008). Technically this is done by simultaneous fitting of the potential 
parameters both to the experimental (rotation-)vibration energies and to the (lower-weighted) ab initio energies 
(Yurchenko et al. 2003); this is an efficient and, as yet, not widely used procedure. 
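
One way to write such a simultaneous fit is as a single weighted least-squares functional. The notation below is introduced here for illustration only and is not taken from the cited papers: $\mathbf{p}$ denotes the potential parameters, $E_i^{\rm obs}$ the experimental energies, $E_i^{\rm calc}(\mathbf{p})$ the corresponding computed energies, $V^{\rm ai}(\mathbf{q}_k)$ the ab initio energy at geometry $\mathbf{q}_k$, and $w$ the weights.

```latex
F(\mathbf{p}) = \sum_i w_i^{\rm obs}
      \left[ E_i^{\rm obs} - E_i^{\rm calc}(\mathbf{p}) \right]^2
  + \sum_k w_k^{\rm ai}
      \left[ V^{\rm ai}(\mathbf{q}_k) - V(\mathbf{q}_k;\mathbf{p}) \right]^2 ,
\qquad w_k^{\rm ai} \ll w_i^{\rm obs} .
```

The second sum acts as the constraint: in regions where no experimental energies exist it is the only term that depends on the shape of the surface, so the refined PES stays close to the ab initio one there.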



As mentioned above, the MARVEL (Measured Active Rotation-Vibration Experimental Levels) procedure (Furtenbacher 
et al. 2007) provides a rigorous protocol for extracting experimental energy levels from the observed data. So far this protocol 
has largely been applied to water and the results are only just being incorporated into line lists (Lodi & Tennyson 2012). We 
will make extensive use of MARVEL during the ExoMol project; note that the "Active" means that the results can be updated 
as more or improved laboratory measurements become available by simply including these measurements and re-running the 
process. The original HCN/HNC line list (Harris et al. 2002) was already updated in this fashion (Harris et al. 2006) and 
extended to H¹³CN/HN¹³C (Harris et al. 2008) using empirical energy levels, although not ones obtained using MARVEL. 
This procedure took advantage of our preferred data structure which separates the line lists into energy and transitions files, 
and which is discussed in Section 3.7. 
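
The idea of inverting measured transitions into energy levels can be sketched as a small weighted least-squares problem. This is a toy illustration written for this section, not the MARVEL code: the function names and the data are hypothetical, the lowest level is simply fixed at zero, and all levels are assumed to be connected to it by at least one chain of transitions.

```python
def solve(matrix, rhs):
    """Solve matrix x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def invert_transitions(transitions, n_levels):
    """Least-squares energy levels from (upper, lower, wavenumber, uncertainty).

    Level indices are 0-based and level 0 is fixed at zero energy.
    """
    n = n_levels - 1                         # unknowns: levels 1 .. n_levels - 1
    normal = [[0.0] * n for _ in range(n)]   # A^T W A
    rhs = [0.0] * n                          # A^T W b
    for upper, lower, nu, sigma in transitions:
        weight = 1.0 / sigma ** 2
        # One design-matrix row per transition: +1 for upper, -1 for lower.
        row = [(lvl, c) for lvl, c in ((upper, 1.0), (lower, -1.0)) if lvl != 0]
        for a, ca in row:
            rhs[a - 1] += weight * ca * nu
            for b, cb in row:
                normal[a - 1][b - 1] += weight * ca * cb
    return [0.0] + solve(normal, rhs)


# Three hypothetical measurements (cm^-1) that do not quite close the loop 0-1-2.
measured = [(1, 0, 10.0, 0.1), (2, 1, 15.0, 0.1), (2, 0, 25.3, 0.1)]
energies = invert_transitions(measured, 3)
print(energies)
```

Because the levels are obtained from a fit rather than entered by hand, adding a new measurement only requires appending it to the list and re-running the inversion, which is the sense of "Active" described above.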

There are essentially two different approaches to the construction of the rotation-vibration Hamiltonian. The first uses 
Hamiltonians based on an exact kinetic energy (EKE) operator for the nuclear motion of the molecule in question, expressed in 
terms of geometrically defined internal coordinates (Tennyson & Sutcliffe 1982). EKE calculations are potentially very accurate, 
since the EKE approach can provide an exact (within the Born-Oppenheimer approximation) evaluation of the corresponding 
matrix elements (Tennyson 2011). However, in order to achieve this high accuracy, very substantial computer resources may 
be needed. Consequently, the EKE approach is the method of choice for triatomics but its use has to be reviewed on a 
case-by-case basis for larger systems to ensure that the calculations are tractable. 

As an alternative, the kinetic energy operator can be defined as a Taylor series in some vibrational coordinates with 
expansion parameters obtained numerically (Yurchenko et al. 2007). Such a non-EKE approach can be adopted for the larger 
molecules, some tetratomics and methane. The rotation-vibration coordinates are chosen to minimise the coupling between 
rotation and vibration. The vibrational coordinates are taken essentially as the Cartesian displacements of the nuclei from 
their positions when the molecule is at equilibrium. For molecules with large amplitude motions, such as HOOH, we can follow 
the HBJ method of Hougen et al. (1970) and expand the nuclear kinetic energy operator in terms of the small-amplitude 
vibrational coordinates from a flexible reference configuration, i.e. around the large-amplitude coordinate. In our extensions of 
the HBJ theory, the eigenvalues and eigenfunctions of the expanded Hamiltonians are determined variationally by numerical 
diagonalisation of a matrix representation of the expanded Hamiltonian. This procedure is implemented in the program 
TROVE of Yurchenko et al. (2007). 

For the largest molecules to be considered, those which have five or more atoms (apart from methane), nuclear motion 
calculations will be performed using new procedures. These will be based on the use of more approximate methods, such 
as MULTIMODE (Carter & Bowman 1998), DEWE (Mátyus et al. 2007) or an adaptation of TROVE, which use normal 
coordinates and which are more appropriate for fairly rigid molecules. These calculations will not be as accurate as those 
proposed for smaller systems and therefore extensive tuning to experimental data will be required. Furthermore the potentially 
huge number of lines required to reproduce a line-by-line spectrum for these species at even moderately high temperature is 
likely to be prohibitive. 

For diatomic and triatomic systems we will generate full line lists in which each rotation-vibration transition is explicitly 
calculated. For tetratomics and methane, this will also be done where computationally feasible. Otherwise vibrational band 
intensities and Hönl-London factors will be used to give transition intensities, while the calculated energy levels and hence 
frequencies will be obtained from calculated, vibrationally averaged rotational constants. 
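
The fallback route can be sketched for the simplest case. The example below assumes a parallel band of a linear molecule, for which the term-value formula and the Hönl-London factors take a particularly simple form; the function names and all constants are hypothetical, and other rotor types need different expressions.

```python
def term_value(origin, b_v, j, d_v=0.0):
    """Term value in cm^-1: origin + B_v J(J+1) - D_v [J(J+1)]^2."""
    x = j * (j + 1)
    return origin + b_v * x - d_v * x * x


def honl_london_parallel(branch, j_lower):
    """Hönl-London factor for a parallel band of a linear molecule (no Q branch)."""
    if branch == "R":          # J' = J'' + 1
        return j_lower + 1
    if branch == "P":          # J' = J'' - 1
        return j_lower
    raise ValueError("branch must be 'P' or 'R'")


def r_branch(origin, b_upper, b_lower, j_max):
    """R-branch wavenumbers for J'' = 0..j_max, with the lower band origin at zero."""
    return [term_value(origin, b_upper, j + 1) - term_value(0.0, b_lower, j)
            for j in range(j_max + 1)]


# Hypothetical constants in cm^-1, chosen only to make the arithmetic easy to follow.
lines = r_branch(origin=2000.0, b_upper=1.75, b_lower=2.0, j_max=2)
factors = [honl_london_parallel("R", j) for j in range(3)]
print(lines)
print(factors)
```

Only a band origin, a band intensity and one rotational constant per vibrational state are needed, which is why this route remains tractable when a full rotation-vibration calculation is not.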

A number of steps will be undertaken to ensure the accuracy of our final line lists. For each system we will initially 
compute a less comprehensive, low-temperature line list which can be checked for reliability against available laboratory 
spectra. This step has proved to be fundamental to the success of our recent line list calculations. We note that in the case 
of methane there are extensive laboratory data, particularly in the near infrared, which have so far defied analysis. It is to 
be anticipated that our line lists will help resolve these problems as has been the case previously for H3+, water and the 
HCN/HNC system. 



3.1 Diatomics 



For many diatomics it is necessary to consider electronic transitions. Best results will rely on the availability of laboratory 
frequency measurements. Such measurements are, of course, more reliable than we can calculate but are rarely complete, 
especially for elevated temperatures. Laboratory data are available for the majority of systems, such as the extensive dataset 
of NiH transition frequencies and associated energy levels that has recently become available (Ross et al. 2012). However, it 
is extremely difficult to construct a complete, high temperature line list only from directly measured data; for example the 
cited NiH spectra contain no usable information on transition intensities. This makes the construction of a reliable theoretical 
model essential. 

Treatment of the nuclear motion problem for the diatomic case is relatively straightforward. For uncoupled electronic states, 
we will simply use the program LEVEL by Le Roy (2007). In more complicated cases, where strong coupling between different 
electronic states is important, it will be necessary to consider this coupling explicitly in the calculation. Marian (1995, 2001) 
has developed a practical theory for such calculations. A program to include couplings between electronic states already exists 
(Zaharova et al. 2009) but will need to be extensively generalised to cover the many different types of couplings that will be 
encountered during the project. Given this, the main issue determining the accuracy of the calculations is one of 
obtaining reliable potential energy curves, curve couplings and transition dipoles. 

Our strategy here will be to start from high grade ab initio methods: multi-reference configuration interaction (MRCI) 
expansions based on full-valence complete active space, self-consistent-field reference states utilising large Gaussian basis sets 
such as aug-cc-pV6Z to resolve valence electron correlation effects. Core and core-valence correlations and scalar relativistic 
energy corrections will also be added. Such features are all standard in quantum chemistry programs such as MOLPRO 
(Werner et al. 2010), Molcas (Aquilante et al. 2010) and Columbus (Lischka et al. 2001). 

High quality ab initio potential energy and interstate coupling curves will provide the starting point for further refinement. 
This will take two forms: 

(i) Ab initio methods will be used to determine relativistic effects, and in particular relativistic spin-orbit and other couplings 
between nearby curves. 

(ii) Spectroscopic data will be used initially to test curves and then to refine them to give curves that reproduce observed 
spectra. Much of this can be done with the program DPotFit (Le Roy 2006), although this procedure may need extending to deal 
with molecules whose electronic states are strongly coupled. 



Work on diatomic line lists is actively underway and the following paper (Tennyson & Yurchenko 2012) reports line lists 
for the X ²Σ⁺ states of BeH, MgH and CaH. 



3.2 Triatomics 



As there are good line lists for the key triatomic species H2O, CO2, HCN/HNC and H3+, relatively few triatomic line lists are 
planned. Species to be considered include H2S, C3 and SO2. The nuclear motion problem for these species will be solved using 
the EKE DVR3D triatomic code (Tennyson et al. 2004), which has already been extensively adapted for the requirement 
of generating large line lists. For example the algorithm to compute dipole transition intensities was both reworked and 
parallelised to cope with the requirements of these calculations. 



A very accurate spectroscopic H2S PES was constructed by Tyuterev et al. (2001); this will provide a good starting point 
although further work will be required to ensure that it remains reliable for higher-lying states. Some work on the role of minor 
corrections to the PES has already been performed (Barletta et al. 2002). The DMS for H2S is less straightforward as H2S has 
a known feature that the dipole associated with the asymmetric stretch passes through zero close to the equilibrium geometry, 
making the DMS very sensitive to the level of theoretical treatment used to model it. A DMS calculated by Cours et al. (2002) 
purports to deal with this problem, but in our tests has not been found to be uniformly reliable. Therefore a new higher level 
theoretical treatment will be needed to give a satisfactory solution to this problem. The procedures developed to produce an 
essentially exact DMS for water (Lodi et al. 2008, 2011) will be used to determine the level of treatment appropriate for H2S. 
In particular, the DMS calculations will use finite differences rather than expectation values which will allow us to test the 
appropriate level at which to introduce relativistic and other "minor" effects prior to launching a full determination. 

A C3 line list was produced by Jørgensen et al. (1989), one of the very early ones to be produced. However this line list 
is no longer accurate by modern standards. C3 is a complicated quasi-linear system with an exceptionally flat PES which 
supports many low-frequency bending modes (Špirko et al. 1997). These have been probed via electronically excited states 
using stimulated emission pumping (Rohlfing & Goldsmith 1989; Northrup & Sears 1990; Rohlfing & Goldsmith 1990) or 
laser induced fluorescence (Rohlfing 1989; Baker et al. 1993). Some preliminary work on constructing a new C3 line list has 
been performed (Tennyson et al. 2007) which was based on measurements (Saha & Western 2006) and ab initio calculations 
(Ahmed et al. 2004) performed in Bristol. A first task will be to improve our current PES by performing a new ab initio 
calculation and then tuning to the available data. 

The third triatomic for which a line list will be computed is SO2. This is a known constituent of solar system planetary 
atmospheres (Na et al. 1990). 



3.3 Tetratomics 



Two different codes will be used for treating the tetratomic nuclear motion problem. TROVE (Yurchenko et al. 2007) has 
already been used to compute an ammonia line list (Yurchenko et al. 2011a) and will be used for phosphine (PH3) and 
formaldehyde (H2CO). However this code is not appropriate for systems which probe linear geometries such as the linear 
acetylene (HCCH). For this molecule the EKE code WAVR4 (Kozin et al. 2004) will be employed; indeed acetylene was one 
of the molecules the code was originally developed to treat (Kozin et al. 2005). 

A preliminary acetylene line list was computed by Urru et al. (2010). This calculation gave reasonable results for spectra 
simulated using vibrational (i.e. J = 0) wave functions and vibrational band intensities (Le Sueur et al. 1992) with the 
rotational fine structure given by vibrational-state dependent rotational constants, and intensities computed using Hönl-London 
factors. We will aim to produce an improved line list based on a fully-coupled rotation-vibration calculation but first 
further work will be required on both the PES and DMS. In this context we note the recent emission spectra of hot acetylene 
obtained by Moudens et al. (2011). In addition we will produce a line list for HOOH which is naturally treated using the 
diatom-diatom coordinate option available in WAVR4. 

Phosphine should be amenable to the same treatment as ammonia and has already been the subject of preliminary, 
low-temperature studies (Yurchenko et al. 2006; Ovsyannikov et al. 2008; Sousa-Silva et al. 2012). Formaldehyde is the 
final tetratomic planned. This has lower symmetry than the other tetratomics discussed so will be computationally the 
most demanding. However a high quality, spectroscopically-determined PES has recently been developed for this molecule 
(Yachmenev et al. 2011) which will make an excellent starting point for a line list calculation. 



3.4 Methane 



The detection of methane in exoplanet HD 189733b (Swain et al. 2008) was notable for the failure to determine the actual 
quantity due to the lack of appropriate laboratory data. Similar problems dogged studies of the impact of comet Shoemaker-Levy 9 
with Jupiter (Dinelli et al. 1997), and also the interpretation of spectra of brown dwarfs (Homeier et al. 2003), where 
the desperate resort of modelling them using methane spectra taken from solar system gas giants and Titan has been used 
(Geballe et al. 1996). Methane is, of course, also an important greenhouse gas as well as being a constituent of many flames. 
There has been, for some time, a clear and pressing need for a comprehensive database of methane transitions. However 
this is a seriously challenging problem involving the calculation of many billions of vibration-rotation transitions. Advances 
in computer power and nuclear motion treatments mean that it is becoming technically possible to contemplate a full and 
systematic computational solution to the methane opacity problem. 



There has already been some work in this direction. Schwenke (2002) performed some preliminary studies with a view to 
computing a line list. More recently Warmbier et al. (2009) did compute a line list using the code MULTIMODE (Carter & 
Bowman 1998). However this line list has neither the number of transitions nor the accuracy required for models of exoplanet 
spectra, or indeed the other applications anticipated here. Currently the main sources of methane spectra are HITRAN 
(Rothman et al. 2009), which is only really appropriate for temperatures below 300 K and is then still not complete, and the 
low-resolution PNNL database (Sharpe et al. 2004). There has recently been significant and coordinated experimental activity 
to try to understand methane spectra and create corresponding line lists (Albert et al. 2009; Wang et al. 2012), in particular 
to aid the interpretation of Titan spectra. 

Methane is a 10 electron system like water (and ammonia); for water there are well developed procedures for obtaining 
an ultra-high accuracy PES (Polyansky et al. 2003); application of these procedures should be easier for methane since it is 
possible to use the faster coupled-cluster approaches, such as CCSD(T), instead of MRCI because of the simpler topology of 
its PES. Methane has nine degrees of vibrational freedom compared to water which only has three and will therefore require 
the calculation of significantly more geometries. However methane's symmetry reduces this number by a factor of 24 and use 
of modern computers should allow us to calculate upwards of 50000 points in a few months using MOLPRO (Werner et al. 2010) 
even with no frozen core, a large (6Z level) or F12 basis set (Hill et al. 2010) while also including allowance for relativistic 
and adiabatic corrections. This potential can be improved using the extensive experimental datasets referenced above. 

The much more difficult steps are the calculation of the 12-dimensional rotation-vibration wave functions and the subsequent 
calculation of all the associated transition intensities: it is to be anticipated that the final line list will comprise many 
billions of transitions. There are two possible strategies for solving this problem. In both cases the vibrational calculations 
will be performed with an upgraded and parallelised version of TROVE (Yurchenko et al. 2007). A strong point of TROVE 
is the automatic and general treatment of symmetries; this is important not only because maximising the use of symmetry 
will help to keep the calculation tractable but also because symmetry is necessary to get nuclear spin effects correct when 
generating spectra. 

The comprehensive solution is to use TROVE to simply compute all possible vibration-rotation transitions directly. The 
calculation of highly rotationally excited states (J up to 40) and, even more so, the calculation of huge lists of dipole transitions 
are computationally demanding in the extreme. It is unclear yet, both because the necessary benchmark calculations need to 
be performed and also because of uncertainty about what computer power will be available, whether this approach will be 
completely feasible or only so in part. 

A more pragmatic, but still reliable, approach is to follow that already used for the preliminary acetylene calculations 
(Urru et al. 2010). Well-converged wave functions from a J = 0 (i.e. vibration only) calculation will be used to compute (a) 
vibrational band intensities and (b) vibrational-state dependent rotational constants. The rotational constants will be used to 
generate the required energy levels and transition frequencies. Vibrational band intensities will be combined with so-called 
Hönl-London factors (well known for the high-symmetry methane molecule) to give the intensity of individual rotation-vibration 
transitions. 



3.5 Larger molecules 

A characteristic of hot molecules is that their spectra become very congested with many blended lines. For heavier molecules 
this limit can be reached at room temperature. Thus, for example, the HITRAN database (Rothman et al. 2009) stores 
all data on species with four or more heavy atoms (about 30 species) as cross-sections rather than fully resolved line lists. 
However cross-section data are only applicable at the temperature of the measurement, usually room temperature. This is a 
severe disadvantage which makes the data inflexible. For example, it would be very difficult to use these cross-section data in 
atmospheric models of an Earth-like planet even when there is only a relatively small temperature differential to Earth, say 
for a super-Earth at 350 K. 

Conversely the use of variational procedures such as the one outlined above for these heavy systems would be both 
computationally very expensive and of much lower accuracy than would be required. We therefore propose a rather more 
empirical approach to address this problem. 

The Watson (1968) Hamiltonian provides a general formulation for the nuclear motion of polyatomic systems which is 
particularly well-suited for use with semi-rigid molecules. This Hamiltonian uses normal modes and, for well behaved systems, it 
is possible to simplify calculations by neglecting high-order coupling terms. This is the basis of the MULTIMODE approach of 
Carter & Bowman (1998) and DEWE (Mátyus et al. 2007); TROVE can also be adapted to work in this fashion. MULTIMODE 
has proved capable of giving reasonable results for the rotation-vibration spectra of relatively large molecules (Wang et al. 
2001). 

Calculations based on Watson's Hamiltonian will be used to model room temperature spectra of the species of interest, 
namely HNO3, C2H4, C2H6 and C3H8. The calculations will use ab initio potentials largely represented by expansions about 
equilibrium in terms of force constants and higher-order terms. These constants will be calculated ab initio as derivatives 
at equilibrium, empirically-determined if available or a mixture of the two. For some large amplitude modes, such as the CH3 
rotations, full potentials will be used. The PES and, if necessary, the associated dipole moments will be systematically 
tuned to reproduce the measured room temperature cross sections for each system. Initially this will have to be done essentially 
by trial and error. However with several systems to work on it is anticipated that we should be able to develop systematic 
procedures for this tuning. 

Having developed satisfactory room temperature models for each system, calculations will be repeated at a grid of 
temperatures to give temperature-dependent cross-sections. Experimental data such as that by Lorono Gonzalez et al. (2010) 
for C2H4 will be used to either confirm the model or to provide further input for an improved tuning procedure. 

The final output of this model will be cross-sections, since at higher temperatures the number of individual lines will 
simply be vast and there is little prospect of high resolution spectra of these species being fully resolved in astronomical 
observations in the near future. 



3.6 Partition functions 



Partition functions are important for models of hot molecules and not altogether straightforward to compute. Extensive 
compilations of partition functions of astrophysically important species have been made by Irwin (1981) and Sauval & Tatum 
(1984). These compilations are comprehensive but do not cover all the molecules to be considered in the ExoMol project; the 
partition function values themselves could also, undoubtedly, be improved at higher temperatures. Partition functions which 
can be considered reliable over an extended temperature range have been constructed for a number of polyatomic molecules 
including H3+ (Neale & Tennyson 1995), water (Vidler & Tennyson 2000), HCN/HNC (Barber et al. 2002) and recently 
acetylene (Amyay et al. 2011). 

In general the partition function is given by 

$$Z(T) = \sum_i g_i \exp\left(-\frac{c_2 E_i}{T}\right), \qquad (1)$$

where $c_2$ is the so-called second radiation constant and is appropriate when the energy, $E_i$, is given as a term value in cm⁻¹, 
$T$ is the temperature and $g_i$ is the statistical weight factor. The statistical weight deserves a special comment. 
If the hyperfine structure is assumed to be unresolved, the statistical weight factor is given by 

$$g_i = (2S + 1)\, g_{\rm ns}^{(i)}\, (2N_i + 1), \qquad (2)$$

where $S$ is the total electronic spin angular momentum, $g_{\rm ns}^{(i)}$ is the state dependent nuclear statistical weight and $N_i$ is the 
rotational angular momentum of the nuclei of the $i$th state. If the spin dependent states are resolved the statistical weight 
factor becomes 

$$g_i = g_{\rm ns}^{(i)}\, (2J_i + 1), \qquad (3)$$

where $J_i$ is the total angular momentum of the $i$th state ($J = N + S$). In Eqs. (2) and (3) the hyperfine structure is assumed to be 
unresolved. We follow the HITRAN convention (Šimečková et al. 2006) and include the entire nuclear statistical weight $g_{\rm ns}$ 
of the molecule explicitly in $g_i$ and hence the partition function $Z(T)$. For example, for BeH where $^9$Be and H have nuclear 
spins 3/2 and 1/2, respectively, $g_{\rm ns} = 8$. If, as can be assumed for BeH (Tennyson & Yurchenko 2012), the spin-rotation 
coupling for the ground electronic state $X\,^2\Sigma^+$ is unresolved, then each rovibronic state $i$ is assumed to be doubly ($2S + 1$) 
degenerate. According to Eq. (2) the statistical weight factor of BeH is then given by $g_i = 2 \times 8 \times (2N_i + 1) = 16\,(2N_i + 1)$. 
If the spin dependent states are resolved then according to Eq. (3) $g_i = 8\,(2J_i + 1)$. For the $^{12}$C$^{12}$C molecule whose nuclear 
spin is 0, the statistical weights $g_{\rm ns}$ of the symmetric s and antisymmetric a rotational levels are 1 and 0, respectively. In the 
case of the $X\,^1\Sigma_g^+$ ground electronic state of this molecule ($2S + 1 = 1$) the statistical weight factors $g_i$ are $(2J_i + 1)$ and 0 for 
the s and a states, respectively. 
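
Eqs. (1) to (3) translate directly into a few lines of code. The sketch below is illustrative only: the function names are chosen here, the nuclear weight is computed as a simple product of (2I + 1) and is therefore valid only for molecules without identical nuclei (such as BeH), and the value of the second radiation constant is the CODATA 2018 one.

```python
import math

C2 = 1.438776877  # second radiation constant hc/k in cm K (CODATA 2018)


def nuclear_weight(spins):
    """Product of (2I + 1) over the nuclei; valid when no nuclei are identical."""
    g = 1
    for spin in spins:
        g *= round(2 * spin + 1)
    return g


def weight_unresolved(s, g_ns, n):
    """Eq. (2): spin components not resolved, n is the rotational quantum number N."""
    return round(2 * s + 1) * g_ns * (2 * n + 1)


def weight_resolved(g_ns, j):
    """Eq. (3): spin components resolved, j may be half-integer."""
    return g_ns * (2 * j + 1)


def partition_function(levels, temperature):
    """Eq. (1): levels is an iterable of (g_i, E_i) pairs with E_i in cm^-1."""
    return sum(g * math.exp(-C2 * e / temperature) for g, e in levels)


g_ns_beh = nuclear_weight([1.5, 0.5])           # 9Be has I = 3/2, H has I = 1/2
print(g_ns_beh)
print(weight_unresolved(0.5, g_ns_beh, 0))      # N = 0 level of a doublet state
print(weight_resolved(g_ns_beh, 0.5))           # the single J = 1/2 component of N = 0
```

For N = 0 both conventions give the same total weight, as they must: the only spin component of that level is J = 1/2, and 8 × (2 × 1/2 + 1) equals 16 × (2 × 0 + 1).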

We note that the partition function, Z, can be used to estimate the completeness of a line list as a function of temperature. 
This can be done by taking the ratio of the partition function computed by summing over all lower-state energy levels 
used to compute the line list to the accurate partition function (Neale et al. 1996; Barber et al. 2006). Indeed this ratio can 
even be used to make corrections for the missing contribution to, for example, the cooling function (Neale et al. 1996). 
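
A minimal sketch of this completeness test follows. It uses a toy model invented for the purpose, a non-degenerate harmonic ladder whose exact partition function is known in closed form, so the names, the level spacing and the truncation point are all assumptions.

```python
import math

C2 = 1.438776877  # second radiation constant hc/k in cm K


def partition_sum(levels, temperature):
    """Sum g_i exp(-c2 E_i / T) over (g_i, E_i) pairs, E_i in cm^-1."""
    return sum(g * math.exp(-C2 * e / temperature) for g, e in levels)


def completeness(levels_in_line_list, reference_q, temperature):
    """Ratio of the line-list partition sum to a trusted partition function."""
    return partition_sum(levels_in_line_list, temperature) / reference_q


# Toy model: harmonic ladder with a hypothetical 1000 cm^-1 spacing.
SPACING = 1000.0
truncated = [(1, n * SPACING) for n in range(5)]   # the "line list" keeps 5 levels


def exact_q(temperature):
    """Closed-form partition function of the infinite ladder."""
    return 1.0 / (1.0 - math.exp(-C2 * SPACING / temperature))


for t in (300.0, 1500.0, 3000.0):
    print(t, completeness(truncated, exact_q(t), t))
```

The ratio stays close to one at low temperature and falls as the temperature rises, which gives a quick estimate of the temperature above which the line list starts to miss a significant share of the population.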



3.7 The ExoMol database 

The backbone of the ExoMol database will be line lists of transitions. However the database will include a variety of associated 
and ancillary data. These will include energy levels, partition functions, cooling functions and cross sections. 

The large amount of data produced in the above calculations, both completed and anticipated, requires the development 
of strategies for data handling and distribution. For small line lists, which are likely to include most diatomics, a simple line list will 
be stored. However for larger line lists we will employ a data structure which involves organising our final line list into two 
files: an energy file and a transitions file. The energy file will contain energy levels for each state combined with a number for 
its position in the file and quantum number assignments, both rigorous and approximate. The transitions file will be arranged 
in ascending frequencies and will list only the number of the upper and lower state for each transition plus the associated 
Einstein-A coefficient. This provides a very compact means of representing the data which is essential for efficient use of 
storage. However, it is likely that the transition file will still need to be split into frequency bins for ease of distribution. 
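
The two-file layout can be sketched as follows. The column order, the file contents and the function names below are hypothetical and chosen only to show the principle; they should not be read as the definitive ExoMol format.

```python
# Energy file: state number, energy (cm^-1), total degeneracy, J.
STATES_TEXT = """\
1    0.000000  1  0
2   20.000000  3  1
3 1000.000000  1  0
4 1019.500000  3  1
"""

# Transitions file: upper state number, lower state number, Einstein A (s^-1).
TRANS_TEXT = """\
2 1 1.0e-03
3 2 2.5e+01
4 1 3.0e+01
"""


def read_states(text):
    """Map state number -> (energy in cm^-1, total degeneracy, J)."""
    states = {}
    for line in text.splitlines():
        number, energy, g, j = line.split()
        states[int(number)] = (float(energy), int(g), int(j))
    return states


def read_transitions(text):
    """List of (upper number, lower number, Einstein A in s^-1)."""
    rows = []
    for line in text.splitlines():
        upper, lower, a_coeff = line.split()
        rows.append((int(upper), int(lower), float(a_coeff)))
    return rows


def wavenumbers(states, transitions):
    """Recover each transition wavenumber from the two state energies."""
    return [states[upper][0] - states[lower][0] for upper, lower, _ in transitions]


states = read_states(STATES_TEXT)
transitions = read_transitions(TRANS_TEXT)
print(wavenumbers(states, transitions))

# Improving one energy updates every dependent wavenumber; the transitions stay untouched.
updated = dict(states)
updated[4] = (1019.75, 3, 1)
print(wavenumbers(updated, transitions))
```

Because wavenumbers are never stored with the transitions, each line costs only two integers and one real number, and a corrected energy level propagates to all of its transitions automatically.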

This data structure has other important advantages. It will also allow us to actively update the energy file with measured 
rotation-vibration energies, or indeed improved approximate quantum numbers, ensuring the best possible frequency for each 
transition. Indeed Harris et al. (2008) turned an H¹²CN/HN¹²C line list into one for H¹³CN/HN¹³C by replacing the energy file with 
one appropriate for ¹³C; this approach relied on the not unreasonable assumption that the Einstein-A coefficients do not 
change significantly between the two isotopologues. 

The sheer volume of data contained in the line lists makes them fairly tricky to use. In practice most codes will use 
some sort of opacity or importance sampling technique to identify key transitions and to discard the rest. We have, however, 
constructed a set of zero-pressure, temperature dependent cross sections for the key species studied so far (Hill et al. ????). 
The purpose of these is not to replace the underlying line lists, whose use will remain necessary for detailed studies and 
analysis of high resolution spectra, but to allow the effects of adding a species to a model to be quickly and efficiently tested. 
The use of these cross sections avoids the issues of handling huge line lists at the price of assuming local thermodynamic 
equilibrium (LTE) and some loss of flexibility. The issue of turning the line lists into cross sections will be discussed elsewhere 
(Hill et al. ????). 
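
For orientation, the sketch below shows one common way of turning a line list into a zero-pressure LTE cross section: the standard LTE line intensity computed from the Einstein-A coefficient, spread over a Doppler (Gaussian) profile. This is an illustration under those assumptions and not necessarily the procedure of the work cited above; the function names and the single test line are hypothetical.

```python
import math

C2 = 1.438776877            # second radiation constant hc/k in cm K
C_CM_S = 2.99792458e10      # speed of light in cm/s
K_B = 1.380649e-23          # Boltzmann constant in J/K
AMU = 1.66053906660e-27     # atomic mass unit in kg


def line_intensity(a_coeff, nu, e_lower, g_upper, q, t):
    """LTE absorption line intensity in cm/molecule (nu and e_lower in cm^-1)."""
    boltzmann = math.exp(-C2 * e_lower / t)
    stimulated = 1.0 - math.exp(-C2 * nu / t)
    return g_upper * a_coeff * boltzmann * stimulated / (8.0 * math.pi * C_CM_S * nu ** 2 * q)


def doppler_hwhm(nu, t, mass_amu):
    """Doppler half width at half maximum in cm^-1."""
    speed = math.sqrt(2.0 * math.log(2.0) * K_B * t / (mass_amu * AMU))   # m/s
    return nu * speed / (C_CM_S * 1.0e-2)


def cross_section(grid, lines, t, q, mass_amu):
    """Cross section in cm^2/molecule on a wavenumber grid.

    lines: iterable of (nu, Einstein A, E_lower, g_upper).
    """
    sigma = [0.0] * len(grid)
    for nu0, a_coeff, e_lower, g_upper in lines:
        strength = line_intensity(a_coeff, nu0, e_lower, g_upper, q, t)
        alpha = doppler_hwhm(nu0, t, mass_amu)
        norm = math.sqrt(math.log(2.0) / math.pi) / alpha   # unit-area Gaussian
        for k, nu in enumerate(grid):
            sigma[k] += strength * norm * math.exp(-math.log(2.0) * ((nu - nu0) / alpha) ** 2)
    return sigma


# One hypothetical line; every number here is illustrative only.
STEP = 1.0e-4
grid = [999.98 + k * STEP for k in range(401)]
line = (1000.0, 1.0, 0.0, 3)
sigma = cross_section(grid, [line], t=300.0, q=100.0, mass_amu=18.0)
print(max(sigma))
```

Integrating the resulting cross section over wavenumber recovers the line intensity, which is a convenient check on any such implementation. The temperature enters through the partition function, the Boltzmann factor and the line width, so a separate cross section is needed at each temperature.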



The results of the calculations outlined above will be a comprehensive database of molecular transitions. The ExoMol 
database, see www.exomol.com, will include not only the line lists, cross sections, cooling functions, partition functions and 
other data generated during the project, but will also provide access to those already available. The database is web-based and 
our aim will be to integrate it into the Virtual Atomic and Molecular Data Centre (VAMDC) project (Dubernet et al. 2010). 



VAMDC data storage is based on the use of XSAMS. XSAMS (XML Schema for Atoms, Molecules and Solids) (Dubernet 
et al. 2011) is an XML based data storage protocol which has been designed by the International Virtual Observatory Alliance 
(IVOA) to meet the needs of astronomers who wish to describe or access molecular (and other) data in distributed datasets 
world-wide. 

XSAMS is both flexible and intuitive, making data manipulation and interpretation significantly easier and less error-prone. 
However the format is very verbose and in its current form it does not seem suitable for storing massive line lists 
such as individual lists with more than 10^" lines which are to be anticipated from the ExoMol project. This will require the 
development of new protocols and, presumably, adaptation of XSAMS. 

Finally, we note that recent tests of models of water spectra in hot Jupiter exoplanets (Tinetti et al. 2012a) suggest that 
pressure broadening can have a significant influence, particularly at long wavelengths. This means that pressure broadening 
parameters, particularly those associated with collisions with H2, should also be considered for inclusion in the ExoMol database 
at some future date. 



4 CONCLUSION 

This paper lays out the scope and methodology for a new project, ExoMol, whose aim is to provide comprehensive line 
lists of molecular transition frequencies and probabilities. The major aim of this project is to provide the necessary data to 
model atmospheres and interpret spectra for exoplanets and cool stars. However it is recognised that the line lists will have 
many other applications within astrophysics and beyond. For example it is our practice not to exclude transitions from our 
lists simply because they are too high in energy to be thermally occupied. This has already led to the identification of a new 
class of very vibrationally-hot water emissions in comets (Barber et al. 2009). It is to be anticipated that such data will be 
important for assigning and modelling maser emissions from high lying or hot states. Similarly the database will be available 
for modelling what may be observable in exoplanet characterisation missions such as the proposed Exoplanet Characterisation 
Observatory (EChO) (Tinetti et al. 2012b) or FINESSE (Swain 2010) space-borne telescopes. 

The ExoMol project will generate very extensive line lists. These will be documented in the present journal and deposited 
in the linked Strasbourg data repository. The line lists, and other information about the project, will also be made available 
via the ExoMol website, www.exomol.com. 